Throttling I/O Streams to Accelerate File-IO Performance
Authors
Abstract
To increase the scale and performance of scientific applications, scientists commonly distribute computation over multiple processors. Often without realizing it, they parallelize file I/O along with the computation. As a result, multiple compute tasks are likely to access the I/O nodes of an HPC system concurrently. When a large number of I/O streams access an I/O node at the same time, I/O performance tends to degrade, which in turn increases application execution time. This paper presents experimental results showing that controlling the number of synchronous file-I/O streams that concurrently access an I/O node can improve performance. We call this mechanism file-I/O stream throttling. The paper (1) describes this mechanism and demonstrates how it can be applied at either the application or the system-software layer, and (2) presents results of experiments driven by the cosmology application benchmark MADbench, executed on a variety of computing systems, that demonstrate the effectiveness of file-I/O stream throttling. The results imply that dynamically selecting the number of synchronous file-I/O streams allowed to access an I/O node can improve application performance. Note that the I/O pattern of MADbench resembles that of a large class of HPC applications.
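The abstract describes throttling only at a high level. The following is a minimal application-layer sketch, assuming Python threads stand in for compute tasks and a counting semaphore stands in for the throttling mechanism; it is not the paper's implementation. Every task still has output to write, but at most `max_streams` synchronous writes are in flight at once while the rest wait. `StreamThrottle`, `write_block`, `run_tasks`, and `pick_stream_limit` are hypothetical names. `pick_stream_limit` is a naive exhaustive timing search, offered only as one possible way to choose the limit dynamically.

```python
import os
import tempfile
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class StreamThrottle:
    """Caps how many synchronous I/O streams may be active at once (hypothetical helper)."""

    def __init__(self, max_streams: int) -> None:
        if max_streams < 1:
            raise ValueError("max_streams must be >= 1")
        self._slots = threading.Semaphore(max_streams)  # one permit per allowed stream
        self._lock = threading.Lock()
        self.active = 0  # streams currently inside the throttled region
        self.peak = 0    # highest concurrency observed

    def __enter__(self) -> "StreamThrottle":
        self._slots.acquire()  # wait until a stream slot is free
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc_info) -> bool:
        with self._lock:
            self.active -= 1
        self._slots.release()  # let the next waiting task start its I/O
        return False  # do not swallow exceptions


def write_block(path: str, data: bytes, throttle: StreamThrottle) -> None:
    """Synchronously write one task's output while holding a stream slot."""
    with throttle:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force the data to storage so the write is synchronous


def run_tasks(n_tasks: int, max_streams: int, directory: str) -> StreamThrottle:
    """Launch n_tasks concurrent writers but admit at most max_streams at a time."""
    throttle = StreamThrottle(max_streams)
    payload = b"x" * (64 * 1024)
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        futures = [
            pool.submit(write_block, os.path.join(directory, f"task{i}.dat"), payload, throttle)
            for i in range(n_tasks)
        ]
        for fut in futures:
            fut.result()  # re-raise any I/O error from a worker
    return throttle


def pick_stream_limit(candidates, n_tasks: int, directory: str):
    """Naive dynamic selection: time each candidate limit and keep the fastest."""
    timings = {}
    for limit in candidates:
        start = time.perf_counter()
        run_tasks(n_tasks, limit, directory)
        timings[limit] = time.perf_counter() - start
    best = min(timings, key=timings.get)
    return best, timings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        throttle = run_tasks(n_tasks=16, max_streams=4, directory=d)
        print("peak concurrent streams:", throttle.peak)  # never exceeds 4
        best, timings = pick_stream_limit([1, 2, 4, 8], n_tasks=16, directory=d)
        print("fastest limit on this machine:", best)
```

On a single workstation the winning limit will likely reflect local disk and page-cache behavior rather than an HPC I/O node. The sketch is meant to show the admit, write, release structure and where a dynamically chosen limit plugs in, not to reproduce the paper's measurements.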
Related papers
I/O Throttling and Coordination for MapReduce
As a leading framework for data-intensive computing, MapReduce has gained enormous popularity in large-scale data analysis. With the increasing adoption of multi-core and many-core platforms, more and more MapReduce tasks are now running on the same node and sharing the same storage resources. The concurrency of tasks raises the issue of I/O stream congestion. We have observed significant throughput drop...
Profile-Guided File Partitioning on Beowulf Clusters
On cluster-based systems, data is typically stored on a centralized resource, and each node has a local disk used for the operating system and swap space. Although I/O middleware (e.g., MPI-IO) and high-performance I/O subsystems (e.g., RAID) can generate parallel I/O streams, disk contention and network latency still dominate I/O performance. To address this performance barrier, I/O access ne...
SFIO: Safe/Fast String/File IO
This paper describes Sfio, a new input/output library that can be used as a replacement for Stdio, the C language standard I/O library. Sfio is more complete, consistent, and efficient than Stdio. New facilities are provided for convenient, safe, and efficient manipulation of data streams. An Sfio stream may be entirely memory resident or it may correspond to some actual file. Alternative I/O d...
MPI/IO on DAFS over VIA: Implementation and Performance Evaluation
In this paper, we describe an implementation of MPI-IO on top of the Direct Access File System (DAFS) standard. The implementation is realized by porting ROMIO on top of DAFS. We identify memory management as one of the main mismatches between MPI-IO and DAFS. Three different design alternatives for memory management are proposed, implemented, and evaluated. We find that memory management in th...
Towards a High Performance Implementation of MPI-IO on the Lustre File System
Lustre is becoming an increasingly important file system for large-scale computing clusters. The problem is that many data-intensive applications use MPI-IO for their I/O requirements, and it has been well documented that MPI-IO performs poorly in a Lustre file system environment. However, the reasons for such poor performance are not currently well understood. We believe that the pri...